47 research outputs found

    The best multicore-parallelization refactoring you've never heard of

    Full text link
    In this short paper, we explore a new way to refactor a simple but tricky-to-parallelize tree-traversal algorithm to harness multicore parallelism. Crucially, the refactoring draws from some classic techniques from programming-languages research, such as the continuation-passing-style transform and defunctionalization. The algorithm we consider faces a particularly acute granularity-control challenge, owing to the wide range of inputs it has to deal with. Our solution achieves efficiency from heartbeat scheduling, a recent approach to automatic granularity control. We present our solution in a series of individually simple refactoring steps, starting from a high-level, recursive specification of the algorithm. As such, our approach may prove useful as a teaching tool, and perhaps be used for one-off parallelizations, as the technique requires no special compiler support

    Efficient Representations for Large Dynamic Sequences in ML

    Get PDF
    International audienceThe use of sequence containers, including stacks, queues, and double-ended queues, is ubiquitous in programming. When the maximal number of elements is not known in advance, containers need to grow dynamically. For this purpose, most ML programs either rely on lists or vectors. These structures are inefficient, both in terms of time and space usage. We investigate the use of chunked-based data structures. Such structures save a lot of memory and may deliver better performance than classic container data structures. We observe a 2x speedup compared with vectors, and up to a 3x speedup compared with lengthy lists

    Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations

    Get PDF
    Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a maximally dense format consequently serializes the processing of that data, yielding a tension between density and parallelism. This paper shows that a disciplined, practical compromise is possible. We present Parallel Gibbon, a compiler that obtains the benefits of dense data formats and parallelism. We formalize the semantics of the parallel location calculus underpinning this novel implementation strategy, and show that it is type-safe. Parallel Gibbon exceeds the parallel performance of existing compilers for purely functional programs that use recursive algebraic datatypes, including, notably, abstract-syntax-tree traversals as in compilers

    Contention in Structured Concurrency: Provably Efficient Dynamic Non-Zero Indicators for Nested Parallelism

    Get PDF
    International audienceOver the past two decades, many concurrent data structures have been designed and implemented. Nearly all such work analyzes concurrent data structures empirically, omitting asymptotic bounds on their efficiency, partly because of the complexity of the analysis needed, and partly because of the difficulty of obtaining relevant asymptotic bounds: when the analysis takes into account important practical factors, such as contention, it is difficult or even impossible to prove desirable bounds. In this paper, we show that considering structured concurrency or relaxed concurrency models can enable establishing strong bounds, also for contention. To this end, we first present a dynamic relaxed counter data structure that indicates the non-zero status of the counter. Our data structure extends a recently proposed data structure, called SNZI, allowing our structure to grow dynamically in response to the increasing degree of concurrency in the system. Using the dynamic SNZI data structure, we then present a concurrent data structure for series-parallel directed acyclic graphs (sp-dags), a key data structure widely used in the implementation of modern parallel programming languages. The key component of sp-dags is an in-counter data structure that is an instance of our dynamic SNZI. We analyze the efficiency of our concurrent sp-dags and in-counter data structures under nested-parallel computing paradigm. This paradigm offers a structured model for concurrency. Under this model, we prove that our data structures require amortized O(1) shared memory steps, including contention. We present an implementation and an experimental evaluation that suggests that the sp-dags data structure is practical and can perform well in practice

    Dag-calculus: a calculus for parallel computation

    Get PDF
    International audienceIncreasing availability of multicore systems has led to greater focus on the design and implementation of languages for writing parallel programs. Such languages support various abstractions for parallelism, such as fork-join, async-finish, futures. While they may seem similar, these abstractions lead to different semantics, language design and implementation decisions, and can significantly impact the performance of end-user applications. In this paper, we consider the question of whether it would be possible to unify various paradigms of parallel computing. To this end, we propose a calculus, called dag calculus, that can encode fork-join, async-finish, and futures, and possibly others. We describe dag calculus and its semantics, establish translations from the afore-mentioned paradigms into dag calculus. These translations establish that dag calculus is sufficiently powerful for encoding programs written in prevailing paradigms of parallelism. We present concurrent algorithms and data structures for realizing dag calculus on multi-core hardware and prove that the proposed techniques are consistent with the semantics. Finally, we present an implementation of the calculus and evaluate it empirically by comparing its performance to highly optimized code from prior work. The results show that the calculus is expressive and that it competes well with, and sometimes outperforms, the state of the art

    LoCal: a language for programs operating on serialized data

    Get PDF
    In a typical data-processing program, the representation of data in memory is distinct from its representation in a serialized form on disk. The former has pointers and arbitrary, sparse layout, facilitating easy manipulation by a program, while the latter is packed contiguously, facilitating easy I/O. We propose a language, LoCal, to unify in-memory and serialized formats. LoCal extends a region calculus into a location calculus, employing a type system that tracks the byte-addressed layout of all heap values. We formalize LoCal and prove type safety, and show how LoCal programs can be inferred from unannotated source terms. We transform the existing Gibbon compiler to use LoCal as an intermediate language, with the goal of achieving a balance between code speed and data compactness by introducing just enough indirection into heap layouts, preserving the asymptotic complexity of traditional representations, but working with mostly or completely serialized data. We show that our approach yields significant performance improvement over prior approaches to operating on packed data, without abandoning idiomatic programming with recursive functions

    La Grange Comprehensive Plan 2018 - 2038

    Get PDF
    In the Fall of 2017, the City of La Grange and Texas Target Communities partnered to create a task force to represent the community. The task force was integral to the planning process, contributing the thoughts, desires, and opinions of community members—as well as their enthusiasm about La Grange’s future. This fifteen-month planning process ended in August 2018. The result of this collaboration is the La Grange Comprehensive Plan, which is the official policy guide for the community’s growth over the next twenty years.La Grange Comprehensive Plan 2018 - 2038 provides a guide for the future growth of the City. This document was developed by Texas Target Communities in partnership with the City of La Grange.Texas Target Communitie
    corecore